The discrepancy between mean reward and mean reinforcement
Authors
Abstract
Similar resources
Optimized Maximum Mean Discrepancy
We propose a method to optimize the representation and distinguishability of samples from two probability distributions, by maximizing the estimated power of a statistical test based on the maximum mean discrepancy (MMD). This optimized MMD is applied to the setting of unsupervised learning by generative adversarial networks (GAN), in which a model attempts to generate realistic samples, and a ...
Maximum Mean Discrepancy Imitation Learning
Imitation learning is an efficient method for many robots to acquire complex skills. Some recent approaches to imitation learning provide strong theoretical performance guarantees. However, there remain crucial practical issues, especially during the training phase, where the training strategy may require execution of control policies that are possibly harmful to the robot or its environment. M...
Testing Hypotheses by Regularized Maximum Mean Discrepancy
Do two data samples come from different distributions? Recent studies of this fundamental problem focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), to compare distributions by the distance between their embeddings. We show that Regularized Maximum Mean Discrepancy (RMMD), our novel measure for kernel-based hypothesis ...
Comparison between the Mean Variance optimal and the Mean
We compare optimal liquidation policies in continuous time in the presence of trading impact using numerical solutions of Hamilton Jacobi Bellman (HJB) partial differential equations (PDE). In particular, we compare the time-consistent mean-quadratic-variation strategy with the time-inconsistent (pre-commitment) mean-variance strategy. We show that the two different risk measures lead t...
Image Analysis Applications of the Maximum Mean Discrepancy Distance Measure
The need to quantify distance between two groups of objects is prevalent throughout the signal processing world. The difference of group means computed using the Euclidean, or ℓ2 distance, is one of the predominant distance measures used to compare feature vectors and groups of vectors, but many problems arise with it when high data dimensionality is present. Maximum mean discrepancy (MMD) is a...
Journal
Journal title: Psychonomic Science
Year: 1965
ISSN: 0033-3131,2197-9952
DOI: 10.3758/bf03343273